Identification of a Writer’s Native Language by Error Analysis

Author

  • Ekaterina Kochmar
Abstract

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation does not exceed the regulation length of 15,000 words, including tables and footnotes.

Summary

In this project, we investigate the task of native language identification. We study a set of Indo-European languages and demonstrate how machine learning techniques can be used to identify the native language of a text's author. A number of different features are extracted and applied to this task, and their contribution to overall performance is investigated and reported. We explore two hypotheses: that the choice of words in a free text is influenced by the writer's native language, and that the errors a writer commits stem from differences between the writer's native language system and that of English. We identify the error types typical of speakers of different native languages, and show how features based on the discriminative error types can improve classification.

Acknowledgments

I would like to thank my supervisor, Prof. Ted Briscoe, for his guidance and constant support. I am grateful for his encouragement and valuable suggestions throughout the course of this work. I would also like to thank Helen Yannakoudakis and Øistein Andersen for their much appreciated help and their ability to identify the weak spots in my work and offer suggestions for improvement.


Related resources

Looking for Low-proficiency Sentences in ELL Writing

Determining whether an author is writing in their native language (L1) or a second language (L2) is a problem that lies at the intersection of four traditional NLP tasks: native language identification, similar language identification, detecting translationese, and grammatical error correction. In general, the goal of the language learner is to improve their proficiency until their writing is i...


Manifest Destiny and American Identity in Cormac McCarthy’s Blood Meridian

McCarthy scholarship has predominantly tended to stress the writer’s revisionism with regard to his rendering of the myth of the American West in Blood Meridian (1985). McCarthy’s novel has been mainly hailed as a critique of the violence of manifest destiny. This study aims to delineate aspects of McCarthy’s narrative which resist the predominant view of him as a revisionist. In this re...


Chinese Native Language Identification

We present the first application of Native Language Identification (NLI) to non-English data. Motivated by theories of language transfer, NLI is the task of identifying a writer’s native language (L1) based on their writings in a second language (the L2). An NLI system was applied to Chinese learner texts using topic-independent syntactic models to assess their accuracy. We find that models using...


Norwegian Native Language Identification

We present a study of Native Language Identification (NLI) using data from learners of Norwegian, a language not yet used for this task. NLI is the task of predicting a writer’s first language using only their writings in a learned language. We find that three feature types, function words, part-of-speech n-grams and a hybrid part-of-speech/function word mixture n-gram model are useful here. Ou...


Use of Articles in Learning English as a Foreign Language: A Study of Iranian English Undergraduates

The significance of error analysis for the learner, the teacher and the researcher is now widely recognized. Earlier studies of error analysis concentrated on intersystematic comparison of the “native language” and the “target language” and drew the required data largely from intuitions and impressionistic observations. This study was conducted on the basis of the following observations: (1) to...




Publication date: 2011